Biometrics is the science of identifying an individual based on their intrinsic anatomical or behavioural characteristics, such as fingerprints, face, iris, gait, and voice. Iris recognition is one of the most successful methods because it exploits the rich texture of the human iris, which is unique even between twins and does not degrade with age. Modern approaches to iris recognition use deep learning to segment the valid portion of the iris from the rest of the eye so that it can then be encoded, stored and compared. This paper aims to improve the accuracy of iris semantic segmentation systems by introducing a novel data augmentation technique. Our method can transform an iris image with a given dilation level into any desired dilation level, thus increasing the variability and number of training examples available from a small dataset. The proposed method is fast and does not require training. The results indicate that our data augmentation method can improve segmentation accuracy by up to 15% for images with high pupil dilation, yielding a more reliable iris recognition pipeline even under extreme dilation.
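The abstract does not describe how the dilation change is computed. One training-free way to change the pupil dilation of an eye image is a radial remapping, sketched below. The sketch assumes concentric circular pupil and limbus boundaries with known radii. The function name `change_dilation` and the linear stretch of the iris annulus are illustrative assumptions, not the authors' method.

```python
import numpy as np

def change_dilation(img, center, r_pupil, r_iris, new_r_pupil):
    """Warp an eye image so that its pupil radius becomes new_r_pupil.

    Hypothetical sketch assuming concentric circular pupil/limbus boundaries
    centred at `center` (row, col). The iris annulus [r_pupil, r_iris] is
    linearly stretched onto [new_r_pupil, r_iris], the pupil disc is scaled to
    fill new_r_pupil, and pixels beyond r_iris are left unchanged.
    """
    assert 0 < r_pupil < r_iris and 0 < new_r_pupil < r_iris
    h, w = img.shape[:2]
    rows, cols = np.mgrid[0:h, 0:w].astype(float)
    dy, dx = rows - center[0], cols - center[1]
    r = np.hypot(dy, dx)
    theta = np.arctan2(dy, dx)

    # Inverse mapping: for every output radius, find the source radius.
    src_r = r.copy()
    inside = r < new_r_pupil
    src_r[inside] = r[inside] * r_pupil / new_r_pupil
    annulus = (r >= new_r_pupil) & (r <= r_iris)
    scale = (r_iris - r_pupil) / (r_iris - new_r_pupil)
    src_r[annulus] = r_pupil + (r[annulus] - new_r_pupil) * scale

    # Nearest-neighbour sampling along the same angle, clipped to the image.
    src_row = np.clip(np.rint(center[0] + src_r * np.sin(theta)), 0, h - 1).astype(int)
    src_col = np.clip(np.rint(center[1] + src_r * np.cos(theta)), 0, w - 1).astype(int)
    return img[src_row, src_col]
```

The mapping is continuous at both boundaries: a source radius of `r_pupil` lands on `new_r_pupil`, and `r_iris` maps to itself, so the sclera is untouched. Calling the function with several target radii on one annotated image produces images at several dilation levels. Applying the same warp to the segmentation mask keeps the labels aligned.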
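This work presents a new database and method for detecting reduced alertness caused by alcohol consumption, drug consumption and sleepiness from near-infrared (NIR) periocular eye images. The study focuses on determining the effect of external factors on the central nervous system (CNS). The aim is to analyse how this affects iris and pupil movement behaviour and whether these changes can be classified with a standard iris NIR capture device. This paper proposes a modified MobileNetV2 to classify iris NIR images taken from subjects under the influence of alcohol, drugs or sleepiness. The results show that the MobileNetV2-based classifier can detect the unfit alertness condition from iris samples captured after alcohol and drug consumption with detection accuracies of 91.3% and 99.1%, respectively. The sleepiness condition is the most challenging, at 72.4%. For two-class grouped images belonging to the Fit/Unfit classes, the model achieves accuracies of 94.0% and 84.0%, respectively, while using fewer parameters than standard deep learning network algorithms. This work is a step towards biometric applications that automatically classify "fitness for duty" and help prevent accidents caused by alcohol/drug consumption and sleepiness.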
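Removing the background from ID card images is a real challenge for remote verification systems, because many re-digitised images have cluttered backgrounds, poor illumination, distortion and occlusion. The background in ID card images confuses classifiers and text extraction. Because few images are available for research, this area remains an open problem in computer vision. This work proposes a method for removing the background using semantic segmentation of ID cards. We used images captured in real operation, forming a manually labelled dataset of 45,007 images that covers five types of ID cards from three countries (Chile, Argentina and Mexico) and includes typical presentation attack scenarios. The method can help improve the subsequent stages of conventional identity verification or document tampering detection systems. Two deep learning approaches were explored, based on MobileNet and DenseNet10. The best results were obtained with MobileNet, which has 6.5 million parameters. The mean Intersection over Union (IoU) for Chilean ID cards was 0.9926 on a private test dataset of 4,988 images. On a fused multi-country dataset of ID card images from Chile, Argentina and Mexico, the best result reached an IoU of 0.9911. The proposed method is lightweight enough for real-time operation on mobile devices.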
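The IoU metric reported above and the final background-removal step can be sketched with NumPy. The sketch assumes binary masks (True = ID card). It also assumes "mean IoU" means the per-image IoU averaged over the test set, since the abstract does not say whether it averages over images or classes. The function names are hypothetical.

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union of two binary masks (True = ID card)."""
    pred, target = np.asarray(pred, dtype=bool), np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:  # both masks empty: count as a perfect match
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

def mean_iou(preds, targets):
    """Per-image IoU averaged over a test set (assumed averaging scheme)."""
    return float(np.mean([iou(p, t) for p, t in zip(preds, targets)]))

def remove_background(img, mask, fill=0):
    """Keep only the pixels the segmenter labelled as card; fill the rest."""
    out = img.copy()
    out[~np.asarray(mask, dtype=bool)] = fill
    return out
```

With a 2-D mask and an H×W×3 image, the boolean index selects whole pixels, so all colour channels of background pixels are filled at once.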
We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text, and combinations of multiple images. Previously, a number of successful DM image generation algorithms have been introduced that make it possible to specify the output image using a text prompt. Inspired by the success of those models, and led by the notion that language was already developed to describe the elements of visual contexts that humans find most important, we introduce an embedding model closely related to a vision-language model. Specifically, we introduce the embedding model S-MAGMA: a 13 billion parameter multimodal decoder combining components from an autoregressive vision-language model MAGMA and biases finetuned for semantic search.
Reinforcement Learning has emerged as a strong alternative for solving optimization tasks efficiently. The performance of these algorithms depends heavily on the feedback signals provided by the environment, which inform the agent of how good (or bad) its decisions are. Unfortunately, in a broad range of problems designing a good reward function is not trivial, so sparse reward signals are adopted instead. The lack of a dense reward function poses new challenges, mostly related to exploration. Imitation Learning has addressed these problems by leveraging demonstrations from experts. In the absence of an expert (and its demonstrations), one option is to prioritize well-suited exploration experiences collected by the agent in order to bootstrap its learning process with good exploration behaviors. However, this solution depends heavily on the agent's ability to discover such trajectories in the early stages of learning. To tackle this issue, we propose to combine imitation learning with intrinsic motivation, two of the most widely adopted techniques for problems with sparse rewards. In this work, intrinsic motivation encourages the agent to explore the environment based on its curiosity, whereas imitation learning allows the agent to repeat the most promising experiences to accelerate learning. This combination is shown to yield improved performance and better generalization in procedurally generated environments, outperforming previously reported self-imitation learning methods and achieving equal or better sample efficiency than intrinsic motivation alone.
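The sketch below shows how the two ingredients might fit together. An intrinsic bonus is added to the sparse extrinsic reward, and a buffer retains the most promising episodes for later imitation. The abstract describes a curiosity-based bonus; this sketch swaps in a simpler count-based novelty bonus, β/√N(s), as a stand-in. The class names, `beta`, and `k` are hypothetical, and the imitation loss itself is not shown.

```python
import heapq
from collections import Counter

class CountBonus:
    """Count-based novelty bonus beta / sqrt(N(s)).

    Stand-in for a curiosity signal (an assumption): rarely visited states
    earn a larger intrinsic reward, which encourages exploration.
    """
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = Counter()

    def __call__(self, state):
        self.counts[state] += 1
        return self.beta / self.counts[state] ** 0.5

class SelfImitationBuffer:
    """Keeps the k episodes with the highest extrinsic return."""
    def __init__(self, k=10):
        self.k = k
        self._heap = []      # min-heap of (return, insertion id, trajectory)
        self._next_id = 0    # unique tie-breaker so trajectories are never compared

    def add(self, trajectory, episode_return):
        item = (episode_return, self._next_id, trajectory)
        self._next_id += 1
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, item)
        elif episode_return > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)  # evict the worst stored episode

    def best(self):
        """Stored trajectories, highest return first."""
        return [traj for _, _, traj in sorted(self._heap, reverse=True)]
```

During training, the agent would be optimized on `r_ext + bonus(state)` at each step and would also imitate the trajectories returned by `best()`. Ranking episodes by extrinsic return alone keeps the imitation target tied to the true task, not to novelty.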
Machine-Learned Likelihoods (MLL) is a method that combines modern machine-learning classification techniques with likelihood-based inference tests, allowing one to estimate the experimental sensitivity of high-dimensional data sets. We extend the MLL method by including exclusion hypothesis tests and show that adding Kernel Density Estimators removes the need to bin the classifier output when extracting the resulting one-dimensional signal and background probability density functions. We first test our method on toy models generated with multivariate Gaussian distributions, where the true probability distribution functions are known. We then apply it to a case of interest in the search for new physics at the HL-LHC, in which a $Z^\prime$ boson decays into lepton pairs, and compare the 95\% CL exclusion limits estimated with our method to those obtained by applying a binned likelihood to the machine-learning classifier output.
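The abstract names two ingredients: densities of the classifier output estimated with a KDE instead of histograms, and an exclusion test. The NumPy sketch below illustrates both, assuming a Gaussian kernel with Silverman's bandwidth, an extended unbinned likelihood with expected yields `s` and `b`, and the usual one-sided upper-limit statistic q_μ = −2 ln[L(μ)/L(μ̂)], set to zero when μ̂ > μ. The grid search for μ̂ and all function names are illustrative choices, not the authors' implementation.

```python
import numpy as np

def gaussian_kde_1d(samples, bandwidth=None):
    """Return a 1-D Gaussian kernel density estimate as a callable pdf.

    Bandwidth defaults to Silverman's rule of thumb: 1.06 * sigma * n**(-1/5).
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * np.sqrt(2 * np.pi))

    def pdf(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        z = (x[:, None] - samples[None, :]) / bandwidth
        return norm * np.exp(-0.5 * z**2).sum(axis=1)

    return pdf

def nll(mu, obs, f_s, f_b, s, b):
    """-ln L(mu) for an extended unbinned likelihood (constants dropped).

    s, b: expected signal and background yields; mu scales the signal.
    obs: classifier outputs of the observed events.
    """
    density = mu * s * f_s(obs) + b * f_b(obs)
    return (mu * s + b) - np.log(density).sum()

def q_exclusion(mu, obs, f_s, f_b, s, b, grid=None):
    """One-sided exclusion statistic q_mu, zero when mu_hat > mu."""
    if grid is None:
        grid = np.linspace(0.0, 5.0, 501)  # mu_hat constrained to be >= 0
    nlls = np.array([nll(m, obs, f_s, f_b, s, b) for m in grid])
    mu_hat = grid[nlls.argmin()]
    if mu_hat > mu:
        return 0.0
    return max(0.0, 2.0 * (nll(mu, obs, f_s, f_b, s, b) - nlls.min()))
```

In practice, `f_s` and `f_b` would be fitted to classifier outputs on simulated signal and background samples. A signal strength μ is then excluded at 95% CL when the p-value derived from q_μ falls below 0.05, for example via asymptotic formulae or toy experiments, neither of which is shown here.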
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of the medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by, and receiving contributions from, research, clinical and industrial teams around the world, who are pursuing applications spanning nearly every aspect of healthcare.
Specular microscopy assessment of the human corneal endothelium (CE) in Fuchs' dystrophy is challenging due to the presence of dark image regions called guttae. This paper proposes a UNet-based segmentation approach that requires minimal post-processing and achieves reliable CE morphometric assessment and guttae identification across all degrees of Fuchs' dystrophy. We cast the segmentation problem as a regression task on the cell and gutta signed distance maps, instead of the pixel-level classification task typically used with UNets. Compared to the conventional UNet classification approach, the distance-map regression approach converges faster in clinically relevant parameters. It also produces morphometric parameters that agree with the manually segmented ground-truth data, with an average cell density difference of -41.9 cells/mm² (95% confidence interval (CI) [-306.2, 222.5]) and an average difference in mean cell area of 14.8 µm² (95% CI [-41.9, 71.5]). These results suggest a promising alternative for CE assessment.
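Casting segmentation as regression requires a signed-distance-map target for each binary mask. The brute-force NumPy sketch below shows what such a target looks like for a toy mask. It assumes a sign convention in which values are positive inside the object and negative outside. The abstract does not state the convention, and the paper regresses separate cell and gutta maps, whereas this handles a single mask. For real image sizes, an exact Euclidean distance transform, such as SciPy's `scipy.ndimage.distance_transform_edt`, would replace the O(N²) search.

```python
import numpy as np

def signed_distance_map(mask):
    """Signed Euclidean distance map of a binary mask (hypothetical helper).

    Assumed convention: foreground pixels store their distance to the nearest
    background pixel (positive); background pixels store minus their distance
    to the nearest foreground pixel. Brute force, O(N^2) memory: toy sizes only.
    """
    mask = np.asarray(mask, dtype=bool)
    if mask.all() or not mask.any():
        raise ValueError("mask must contain both foreground and background pixels")
    pixels = np.indices(mask.shape).reshape(2, -1).T  # (N, 2) row/col coordinates
    fg, bg = np.argwhere(mask), np.argwhere(~mask)

    def nearest_distance(points):
        diff = pixels[:, None, :] - points[None, :, :]
        return np.sqrt((diff**2).sum(axis=-1)).min(axis=1).reshape(mask.shape)

    return np.where(mask, nearest_distance(bg), -nearest_distance(fg))
```

Under this convention, every foreground pixel has a value of at least 1 and every background pixel at most -1. Thresholding a predicted map at zero (`sdm > 0`) therefore recovers the segmentation, which is why little post-processing is needed.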
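Humans exploit prior knowledge to describe images and can adapt their explanations to specific contextual information, even to the extent of inventing plausible explanations when the contextual information and the image do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to generate contextualized captions. In particular, similar Wikimedia images can be used to illustrate different articles, and the generated caption needs to be adapted to each specific context, which lets us explore the limits of a model's ability to adjust captions to different contextual information. A particularly challenging task in this domain is handling out-of-vocabulary words and named entities. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task yields improvements over baseline models. Furthermore, we verify that a model pre-trained with the MNEM objective on Wikipedia generalizes well to a news captioning dataset. Additionally, we define two different test splits according to the difficulty of the captioning task. We provide insights into the role and importance of each modality and highlight the limitations of our model. The code, models and data splits will be made publicly available upon acceptance.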
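The abstract names the MNEM objective but gives no details. The sketch below shows one plausible way to build an MNEM-style training example: whole named-entity spans are replaced with a mask token, and the model is trained to recover the original tokens. Several details are assumptions: word-level tokens, entity spans supplied by an external NER tagger (not shown), whole-span masking with probability `mask_prob`, and the `[MASK]` token string. The function name is hypothetical.

```python
import random

MASK = "[MASK]"  # assumed mask token

def mask_named_entities(tokens, entity_spans, mask_prob=1.0, seed=None):
    """Build an MNEM-style (masked tokens, labels) pair.

    tokens: list of word tokens.
    entity_spans: list of (start, end) token indices, end exclusive, marking
        named entities (assumed to come from an NER tagger).
    Each span is masked as a whole with probability mask_prob. Labels hold the
    original token at masked positions and None elsewhere (ignored by the loss).
    """
    rng = random.Random(seed)
    masked, labels = list(tokens), [None] * len(tokens)
    for start, end in entity_spans:
        if rng.random() < mask_prob:
            for i in range(start, end):
                labels[i] = tokens[i]
                masked[i] = MASK
    return masked, labels
```

Masking entire entities instead of random subwords forces the model to infer names from the article and image context. This matches the motivation given above for handling out-of-vocabulary named entities.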
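In this paper, we present a framework for multilingual scene text visual question answering that handles new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA), in which questions can be asked in different languages that are not necessarily aligned with the language of the scene text. We therefore first introduce a natural step towards a more generalized version of STVQA: MUST-VQA. With this in mind, we discuss two evaluation scenarios in the constrained setting, namely IID and zero-shot, and show that the models can perform on par in the zero-shot setting. We further provide extensive experiments and show the effectiveness of adapting multilingual language models to the STVQA task.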